Multi-view Acoustic Feature Learning Using Articulatory Measurements
Abstract
We consider the problem of learning a linear transformation of acoustic feature vectors for phonetic frame classification, in a setting where articulatory measurements are available at training time. We use the acoustic and articulatory data together in a multi-view learning approach, in particular using canonical correlation analysis to learn linear transformations of the acoustic features that are maximally correlated with the articulatory data. We also investigate simple approaches for combining information shared across the acoustic and articulatory views with information that is private to the acoustic view. We apply these methods to phonetic frame classification on data drawn from the University of Wisconsin X-ray Microbeam Database. We find a small but consistent advantage to the multi-view approaches combining shared and private information, compared to the baseline acoustic features or unsupervised dimensionality reduction using principal components analysis.
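As a rough illustration of the setup described above (not the authors' exact pipeline), the following Python sketch uses CCA to learn a linear projection of acoustic features against articulatory measurements that are available only at training time, then concatenates the CCA-projected ("shared") features with PCA features ("private" to the acoustic view) for frame classification. The scikit-learn components, the dimensionalities, and the logistic-regression classifier are illustrative assumptions.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def train_multiview_classifier(X_acoustic, X_artic, y, n_cca=15, n_pca=40):
    # Shared information: CCA projection of the acoustic view, learned from
    # paired acoustic-articulatory frames. n_cca cannot exceed the
    # dimensionality of the smaller (articulatory) view.
    cca = CCA(n_components=n_cca).fit(X_acoustic, X_artic)
    shared = cca.transform(X_acoustic)

    # Private information: unsupervised PCA on the acoustic view alone.
    pca = PCA(n_components=n_pca).fit(X_acoustic)
    private = pca.transform(X_acoustic)

    # Simple combination: concatenate shared and private features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.hstack([shared, private]), y)
    return cca, pca, clf

def classify_frames(cca, pca, clf, X_acoustic):
    # At test time only the acoustic view is available.
    feats = np.hstack([cca.transform(X_acoustic), pca.transform(X_acoustic)])
    return clf.predict(feats)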
Similar Papers
Kernel CCA for multi-view learning of acoustic features using articulatory measurements
We consider the problem of learning transformations of acoustic feature vectors for phonetic frame classification, in a multi-view setting where articulatory measurements are available at training time but not at test time. Canonical correlation analysis (CCA) has previously been used to learn linear transformations of the acoustic features that are maximally correlated with articulatory measur...
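A minimal sketch of the kernel variant follows, assuming RBF kernels and a simple regularized dual eigenproblem; the kernel choice, regularization, and function names are illustrative assumptions rather than the paper's implementation.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def center_kernel(K):
    # Double-center a training kernel matrix.
    n = K.shape[0]
    J = np.ones((n, n)) / n
    return K - J @ K - K @ J + J @ K @ J

def kcca_acoustic_projections(X_acoustic, X_artic, n_components=15,
                              gamma=1e-3, reg=1e-1):
    # Regularized kernel CCA: solve the eigenproblem
    #   (Kx + reg*I)^{-2} Kx Ky (Ky + reg*I)^{-2} Ky Kx a = rho^2 a
    # for the acoustic-view dual coefficients a. The n x n kernels are too
    # large for all frames in practice, so one subsamples or uses a low-rank
    # approximation.
    Kx = center_kernel(rbf_kernel(X_acoustic, gamma=gamma))
    Ky = center_kernel(rbf_kernel(X_artic, gamma=gamma))
    I = np.eye(Kx.shape[0])
    Rx = np.linalg.inv(Kx + reg * I)
    Ry = np.linalg.inv(Ky + reg * I)
    M = Rx @ Rx @ Kx @ Ky @ Ry @ Ry @ Ky @ Kx
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    return vecs[:, order].real  # columns are dual coefficients alpha

# Test-time acoustic features (centering of the test kernel omitted here):
# rbf_kernel(X_test, X_acoustic_train, gamma=gamma) @ alpha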
Acoustic feature learning using cross-domain articulatory measurements
Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions. One limitation of this prior work is that the learned feature models are difficult to port to new datasets or domains, and articulatory data is not available for most spee...
Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, whic...
Combining acoustic and articulatory feature information for robust speech recognition
The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label "articulatory" include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...
Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Silent speech recognition (SSR) converts non-audio information such as articulatory (tongue and lip) movements to text. Articulatory movements generally have less information than acoustic features for speech recognition, and therefore, the performance of SSR may be limited. Multiview representation learning, which can learn better representations by analyzing multiple information sources simul...
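As a sketch of the deep CCA idea mentioned above, two nonlinear projection networks (one per view) can be trained to maximize the correlation between their outputs. The PyTorch formulation, layer sizes, and regularization constants below are illustrative assumptions, not the paper's model.

import torch
import torch.nn as nn

class ViewNet(nn.Module):
    # Small feed-forward projection network for one view.
    def __init__(self, d_in, d_out, d_hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out))

    def forward(self, x):
        return self.net(x)

def dcca_loss(H1, H2, eps=1e-4):
    # Negative total correlation between the projected views: the sum of
    # singular values of the whitened cross-covariance matrix.
    n = H1.shape[0]
    H1 = H1 - H1.mean(0, keepdim=True)
    H2 = H2 - H2.mean(0, keepdim=True)
    S11 = H1.T @ H1 / (n - 1) + eps * torch.eye(H1.shape[1], device=H1.device)
    S22 = H2.T @ H2 / (n - 1) + eps * torch.eye(H2.shape[1], device=H2.device)
    S12 = H1.T @ H2 / (n - 1)

    def inv_sqrt(S):
        vals, vecs = torch.linalg.eigh(S)
        return vecs @ torch.diag(vals.clamp_min(eps).rsqrt()) @ vecs.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return -torch.linalg.svdvals(T).sum()

# Training over paired acoustic (x) and articulatory (y) minibatches; large
# batches are needed for stable covariance estimates:
# f, g = ViewNet(d_acoustic, 50), ViewNet(d_artic, 50)
# opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
# for x, y in loader:
#     opt.zero_grad(); loss = dcca_loss(f(x), g(y)); loss.backward(); opt.step()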